NSF PAR Search | NSF Public Access Repository

Ensemble automated approaches for producing high‐quality herbarium digital records

https://doi.org/10.1002/aps3.11623

Guralnick, Robert P; LaFrance, Raphael; Allen, Julie M; Denslow, Michael W (November 2024, Applications in Plant Sciences)

Abstract
Premise: One of the slowest steps in digitizing natural history collections is converting labels associated with specimens into a digital data record usable for collections management and research. Here, we address how herbarium specimen labels can be converted into digital data records via extraction into standardized Darwin Core fields.
Methods: We first showcase the development of a rule‐based approach and compare outcomes with a large language model–based approach, in particular ChatGPT4. We next quantified omission and commission error rates across target fields for a set of labels transcribed using optical character recognition (OCR) for both approaches. For example, we find that ChatGPT4 often creates field names that are not Darwin Core compliant while rule‐based approaches often have high commission error rates.
Results: Our results suggest that these approaches each have different strengths and limitations. We therefore developed an ensemble approach that leverages the strengths of each individual method and documented that ensembling strongly reduced overall information extraction errors.
Discussion: This work shows that an ensemble approach has particular value for creating high‐quality digital data records, even for complicated label content. While human validation is still needed to ensure the best possible quality, automated approaches can speed digitization of herbarium specimen labels and are likely to be broadly usable for all natural history collection types.
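The omission and commission error rates mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact definitions, so this assumes a common convention: an omission is a field present in the reference record but missing from the extraction, and a commission is an extracted field that is absent from the reference or carries a different value. The function name `error_rates` and the sample records are hypothetical.

```python
from typing import Dict

Record = Dict[str, str]  # Darwin Core term -> extracted value


def error_rates(gold: Record, predicted: Record) -> Dict[str, float]:
    """Return omission and commission rates for one label (assumed definitions)."""
    # Only non-empty values count as populated fields.
    gold_fields = {k for k, v in gold.items() if v}
    pred_fields = {k for k, v in predicted.items() if v}

    # Omission: the reference has the field, the extraction does not.
    omitted = gold_fields - pred_fields
    # Commission: extracted field is not in the reference, or its value differs.
    committed = {
        k for k in pred_fields
        if k not in gold_fields or predicted[k] != gold[k]
    }

    omission = len(omitted) / len(gold_fields) if gold_fields else 0.0
    commission = len(committed) / len(pred_fields) if pred_fields else 0.0
    return {"omission": omission, "commission": commission}


if __name__ == "__main__":
    # Hypothetical reference and extracted records for a single label.
    gold = {"recordedBy": "A. Smith", "eventDate": "1999-05-01", "country": "USA"}
    pred = {"recordedBy": "A. Smith", "eventDate": "1999-05-10", "habitat": "woods"}
    print(error_rates(gold, pred))
```

Rates computed this way are per label; averaging them per field across a set of labels would give the kind of field-level comparison the abstract describes.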
Full Text Available

Humans in the loop: Community science and machine learning synergies for overcoming herbarium digitization bottlenecks

https://doi.org/10.1002/aps3.11560

Guralnick, Robert; LaFrance, Raphael; Denslow, Michael; Blickhan, Samantha; Bouslog, Mark; Miller, Sean; Yost, Jenn; Best, Jason; Paul, Deborah L; Ellwood, Elizabeth; et al (January 2024, Applications in Plant Sciences)

Abstract
Premise: Among the slowest steps in the digitization of natural history collections is converting imaged labels into digital text. We present here a working solution to overcome this long‐recognized efficiency bottleneck that leverages synergies between community science efforts and machine learning approaches.
Methods: We present two new semi‐automated services. The first detects and classifies typewritten, handwritten, or mixed labels from herbarium sheets. The second uses a workflow tuned for specimen labels to label text using optical character recognition (OCR). The label finder and classifier was built via humans‐in‐the‐loop processes that utilize the community science Notes from Nature platform to develop training and validation data sets to feed into a machine learning pipeline.
Results: Our results showcase a >93% success rate for finding and classifying main labels. The OCR pipeline optimizes pre‐processing, multiple OCR engines, and post‐processing steps, including an alignment approach borrowed from molecular systematics. This pipeline yields >4‐fold reductions in errors compared to off‐the‐shelf open‐source solutions. The OCR workflow also allows human validation using a custom Notes from Nature tool.
Discussion: Our work showcases a usable set of tools for herbarium digitization including a custom‐built web application that is freely accessible. Further work to better integrate these services into existing toolkits can support broad community use.
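The idea of combining several OCR engines through an alignment step can be sketched roughly as follows. This is not the authors' pipeline: it assumes the OCR outputs have already been aligned to equal length with a gap symbol (as a multiple sequence alignment would produce) and simply takes a per-column majority vote. The name `consensus`, the `GAP` symbol, and the sample strings are hypothetical.

```python
from collections import Counter
from typing import List

GAP = "-"  # hypothetical gap symbol inserted by a prior alignment step


def consensus(aligned: List[str]) -> str:
    """Majority-vote consensus over pre-aligned OCR outputs of equal length."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("aligned strings must all have the same length")

    out = []
    for column in zip(*aligned):
        # Pick the most frequent character in this alignment column.
        char, _count = Counter(column).most_common(1)[0]
        if char != GAP:  # a winning gap means no character at this position
            out.append(char)
    return "".join(out)


if __name__ == "__main__":
    # Three hypothetical engine outputs for the same label line.
    print(consensus(["Quercus a1ba", "Quercus alba", "Quercvs alba"]))
```

A per-column vote only helps when the engines make different mistakes; producing the alignment itself is the harder step and is not shown here.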
Full Text Available

From spectators to stewards: Transforming public involvement in natural history collections

https://doi.org/10.3897/nhcm.1.138247

von_Konrat, Matt; Rodriguez, Yarency; Bailey, Colleen; Gwilliam_III, Gilbert F; Christian, Christine; Aguero, Blanka; Ahn, June; Albion, Zoe; Allen, James R; Bailey, Colin; et al (December 2024, Natural History Collections and Museomics)

A comprehensive overview of volunteer-driven public programs focused on activities to enhance natural history collections (NHCs) is provided. The initiative revolves around the WeDigBio events and the Collections Club at the Field Museum, aiming to deepen the public’s connection with scientific collections, enhance participatory science, and improve data associated with natural history specimens. The implementation and journey of these programs are outlined, including surveys conducted from 2015 through 2021 to gauge participant motivation, satisfaction, and the impact of these events on public engagement with NHCs. Results show trends in on-site and virtual volunteer participation over the years, especially during the peak period of the COVID-19 pandemic. The majority of participants expressed high satisfaction, indicating a willingness to continue participating in similar activities. The surveys revealed a shift towards more altruistic motivations for participation over time, with increased emphasis on supporting the Field Museum and contributing to the scientific community. The success of participatory science events demonstrates the potential of volunteer-driven programs to contribute meaningfully to the preservation, digitisation, and understanding of biodiversity collections, ultimately transforming spectators into stewards of natural history. From 2015 to the present, participants have reached a significant milestone, with over a thousand community scientists contributing to the inventorying, collection care, curation, databasing, or transcription of 286,071 specimens, objects, or records. We also discuss accuracy and quality control as well as a checklist and recommendations for similar activities.
Full Text Available

Digitization protocol for scoring reproductive phenology from herbarium specimens of seed plants

https://doi.org/10.1002/aps3.1022

Yost, Jennifer M.; Sweeney, Patrick W.; Gilbert, Ed; Nelson, Gil; Guralnick, Robert; Gallinat, Amanda S.; Ellwood, Elizabeth R.; Rossington, Natalie; Willis, Charles G.; Blum, Stanley D.; et al (February 2018, Applications in Plant Sciences)
